Abstract

Crowdsourcing is now widely used to replace judgement or evaluation by an expert authority with 
an aggregate evaluation from a number of non-experts, in applications ranging from rating and categorizing
online content all the way to evaluation of student assignments in massively open online courses
(MOOCs) via peer grading. A key issue in these settings, where direct monitoring of both effort and 
accuracy is infeasible, is incentivizing agents in the 'crowd' to put in effort to make good evaluations, as 
well as to truthfully report their evaluations. We study the design of mechanisms for crowdsourced judgement
elicitation when workers strategically choose both their reports and the effort they put into their
evaluations. This leads to a new family of information elicitation problems with unobservable ground 
truth, where an agent's proficiency — the probability with which she correctly evaluates the underlying 
ground truth — is endogenously determined by her strategic choice of how much effort to put into the 
task. 

Our main contribution is a simple, new, mechanism for binary information elicitation for multiple 
tasks when agents have endogenous proficiencies, with the following properties: (i) Exerting maximum 
effort followed by truthful reporting of observations is a Nash equilibrium, (ii) This is the equilibrium 
with maximum payoff to all agents, even when agents have different maximum proficiencies, can use 
mixed strategies, and can choose a different strategy for each of their tasks. Our information elicitation 
mechanism requires only minimal bounds on the priors, asks agents to only report their own evaluations, 
and does not require any conditions on a diverging number of agent reports per task to achieve its 
incentive properties. The main idea behind our mechanism is to use the presence of multiple tasks and 
ratings to estimate a reporting statistic to identify and penalize low-effort agreement — the mechanism 
rewards agents for agreeing with another 'reference' agent report on the same task but also penalizes for 
blind agreement by subtracting out this statistic term, designed so that agents obtain rewards only when 
they put in effort into their observations. 

1 Introduction 

Crowdsourcing, where a problem or task is broadcast to a crowd of potential participants for solution, is used 
for an increasingly wide variety of tasks on the Web. One particularly common application of crowdsourcing 
is in the context of making evaluations, or judgements — when the number of evaluations required is too 
large for a single expert, a solution is to replace the expert by an evaluation aggregated from a 'crowd' 
recruited on an online crowdsourcing platform such as Amazon Mechanical Turk. Crowdsourced judgement 
elicitation is now used for a number of applications such as image classification and labeling, judging the 
quality of online content, identifying abusive or adult content, and most recently for peer grading in online 
education, where Massively Open Online Courses (MOOCs) with enrollment in the hundreds of thousands 
crowdsource the problem of evaluating assignments submitted by students back to the class itself. While one 
issue in the context of crowdsourcing evaluations is how best to aggregate the evaluations obtained from the 
crowd, there is also a key question of eliciting the best possible evaluations from the crowd in the first place. 

The problem of designing incentive mechanisms for such crowdsourced judgement elicitation scenarios 
has two aspects. First, suppose each worker has already evaluated, or formed a judgement on, the tasks 
allocated to her. Since the 'ground truth' for each task is unknown to the system, a natural solution is to 
reward workers based on other workers' reports for the same task (this being the only available source of
information about this ground truth).^1 The problem of designing rewards to incentivize agents to truthfully
report their observation, rather than, for example, a report that is more likely to agree with other agents'
reports, is an information elicitation problem with unobservable ground truth. Information elicitation has
recently been addressed in the literature in the context of eliciting opinions online (such as user opinions
about products, or experiences with service providers); see §1.1. However, in those settings, agents (users)
have already formed an opinion after receiving their signal (for example a user who buys a product forms 
an opinion about it after buying it) — so agents only need to be incentivized to incur the cost to report this 
opinion and find it more profitable to report their opinions truthfully than to report a different opinion. 

In the crowdsourcing settings we consider, however, the user does not have such a pre-formed, or experiential,
opinion anyway, but rather forms a judgement as part of her task. Further, the accuracy of this
judgement depends on whether or not the agent puts in effort into it (for instance, a worker evaluating 
whether images contain objectionable content could put in no effort and declare all images to be clean, or 
put in effort into identifying which images are actually appropriate; a similar choice applies in other contexts 
like peer-grading). A key issue in these crowdsourced judgement elicitation scenarios is therefore incentivizing
effort,^2 that is, ensuring that agents make the best judgements that they possibly can (in addition, of
course, to ensuring that they then truthfully report this observation). This leads to a new kind of information
elicitation problem where an agent's proficiency now depends on her effort choice, and so is endogenous
and unknown to the system — even if an agent's maximum proficiency is known, the actual proficiency with 
which she performs a task is an endogenous, strategic choice and therefore cannot be assumed as fixed or 
given. 

A mechanism for information elicitation in this setting should make it 'most beneficial', if not the only 
beneficial strategy, for agents to not just report their observations truthfully, but to also make the best 
observations they can in the first place. Also, it is even more important now to ensure that the payoffs from 
all agents always blindly reporting the same observation (for instance, declaring all content to be good) are 
strictly smaller than the payoffs from truthfully reporting what was actually observed, since declaring all 
tasks to be of some predecided type requires no effort and therefore incurs no cost, whereas actually putting 
in effort into making observations will incur a nonzero cost. Finally, unlike mechanisms designed for settings 
where a large audience is being polled for its opinion about a single event, a mechanism here must retain its 
incentive properties even when there are only a few reports per task — this is because it can be infeasible, 
due to either monetary or effort constraints, to solicit reports from a large number of agents for each task. 
(For example, the number of tasks in peer grading scales linearly with the number of agents, limiting the 
number of reports available for each task since each student can only grade a few assignments; similarly, the 
total cost to the requester in crowdsourcing platforms such as Amazon Mechanical Turk scales linearly with 
the number of workers reporting on each task). How can we elicit the best possible evaluations from agents
whose proficiency of evaluation depends on their strategically chosen effort, when the ground truth as well 
as the effort levels of agents are unobservable to the mechanism? 

Our Contributions. We introduce a model for information elicitation with endogenous proficiency, where 
an agent's strategic choice of whether or not to put in effort into a task endogenously determines her proficiency
(the probability of correctly evaluating the ground truth) for that task. We focus on the design
of mechanisms for binary information elicitation, i.e., when the underlying ground truth is binary (corresponding
to eliciting 'good' or 'bad' ratings). While generalizing to an arbitrary underlying type space is an
immediate direction for further work, we note that a number of interesting judgement and evaluation tasks, 
for example identifying adult content or correctness evaluation, are indeed binary; also, even very recent 
literature providing improved mechanisms for information elicitation (e.g. [?, ?]), as well as experimental 
work on the performance of elicitation mechanisms [?, ?], focuses on models with binary ground truth. 

^1 It is of course infeasible for a requester to monitor every worker's performance on her task, since this would be a problem
of the same scale as simply performing all the tasks herself. We also note that a naive approach of randomly checking some
subset of evaluations, either via inserting tasks with known responses, or via random checking by the requester, turns out to
be very wasteful of effort at the scale necessary to achieve the right incentives.

^2 We thank David Evans (VP Education, Udacity) for pointing out this issue in the context of peer-grading applications:
while students might put in their best efforts on grading screening assignments to ensure they demonstrate the minimum
proficiency required to be allowed to grade, how can we be sure that they will continue to work with the same proficiency when
grading homeworks outside of this screening set?

Our main contribution is a simple, new mechanism for binary information elicitation for multiple tasks
when agents have endogenous proficiencies. Our mechanism has the following incentive properties.

(i) Exerting maximum effort followed by truthful reporting of observations is a Nash equilibrium. 

(ii) This is the equilibrium with maximum payoff to all agents, even when agents have different maximum 
proficiencies, can use mixed strategies, and can choose a different strategy for each of their tasks. 

Showing that full-effort truthtelling leads to the maximum reward amongst all equilibria (including 
those involving mixed strategies) requires arguing about the rewards to agents in all possible equilibria 
that may arise. To do this, we use a matrix representation of strategies where every strategy can 
be written as a convex combination of 'basis' strategies, so that maximizing a function over the set 
of all possible strategies is equivalent to a maximization over the space of coefficients in this convex 
combination. This representation lets us show that the reward to an agent over all possible strategy 
choices (by herself and other agents), and therefore over all equilibria, is maximized when all agents 
use the strategy of full-effort truthful reporting. 

(iii) Suppose there is some positive probability, however small, that there is some 'trusted' agent for each 
task who will report on that task truthfully with proficiency greater than half. Then the equilibrium
where all agents put in full effort and report truthfully on all their tasks is essentially the only
equilibrium of our mechanism, even if the mechanism does not know the identity of the trusted agents.

We note that our mechanism requires only minimal bounds on the priors and imposes no conditions on a 
diverging number of agent reports per task to achieve its incentive properties — to the best of our knowledge, 
previous mechanisms for information elicitation do not provide all these guarantees simultaneously, even 
when proficiency is not an endogenously determined choice (see §1.1 for a discussion).

The main idea behind our mechanism $\mathcal{M}$ is the following. With just one task, it is difficult to distinguish
between agreement arising from high-effort observations of the same ground truth, and 'blind' agreement
achieved by the low-effort strategy of always making the same report. We use the presence of multiple
tasks and ratings to distinguish between these two scenarios and appropriately reward or penalize agents to
incentivize high effort: $\mathcal{M}$ rewards an agent $i$ for her report on task $j$ for agreeing with another 'reference'
agent $r_j(i)$'s report on the same task, but also penalizes for blind agreement by subtracting out a statistic
term corresponding to the part of $i$ and $r_j(i)$'s agreement on task $j$ that is to be 'expected anyway' given
their reporting statistics estimated from other tasks. This statistic term is chosen so that there is no benefit 
to making reports that are independent of the ground truth; the incentive properties of the mechanism follow 
from this property that agents obtain positive rewards only when they put effort into their evaluations. 

1.1 Related Work 

The problem of designing incentives for crowdsourced judgement elicitation is closely related to the growing 
literature on information elicitation mechanisms. The key difference between this literature (discussed in 
greater detail below) and our work is that agents in the settings motivating past work have opinions that 
are experientially formed anyway — independent, and outside of, any mechanisms to elicit opinions — so that 
agents only need be incentivized to participate and truthfully report these opinions. In contrast, agents in 
the crowdsourcing settings we study do not have such experientially formed opinions to report — an agent 
makes a judgement only because it is part of her task, expending effort to form her judgement, and therefore 
must be incentivized to both expend this effort and then to truthfully report her evaluation. There are also 
other differences in terms of the models and guarantees in previous mechanisms for information elicitation; 
we discuss this literature below. 

The peer-prediction method, introduced by Miller, Resnick and Zeckhauser [?], is a mechanism for the
information elicitation problem for general outcome spaces in which truthful reporting is a Nash equilibrium;
it uses proper scoring rules to reward agents for reports that are predictive of other agents' reports. The
main difference between our mechanism and [?], as well as other mechanisms based on the peer prediction
method [?, ?, ?, ?, ?], is in the model of agent proficiency. In peer-prediction models, while agents can
decide whether to incur the cost to participate (i.e., submit their opinion), an agent's proficiency (the
distribution of opinions or evaluations conditional on ground truth) is exogenously determined (and common
to all agents, and in most models, known to the center). That is, an agent might not participate at all,
but if she does participate she is assumed to have some known proficiency. In contrast, in our setting,
agents can choose not just whether or not to participate, but also endogenously determine their proficiency 
conditional on participating through their effort choice. Thus, while peer prediction mechanisms do need 
to incentivize agents to participate (by submitting a report), they then know the proficiency of agents 
who do submit a report, and therefore can, and do, dispense rewards that crucially use knowledge of this 
proficiency. In contrast, even agents who do submit reports in our setting cannot be assumed to be using their 
maximum proficiency to make their evaluations, and therefore cannot be rewarded based on any assumed 
level of proficiency. Additionally, truthtelling, while an equilibrium, is not necessarily the maximum-reward
equilibrium in these existing peer-prediction mechanisms: [?] shows that for the mechanisms in [?, ?], the
strategies of always reporting 'good' or always reporting 'bad' both constitute Nash equilibria, at least one of
which generates higher payoff than truthtelling. Such blind strategy equilibria can be eliminated and honest
reporting made the unique Nash equilibrium by designing the payments as in [?], but this needs agents to be
restricted to pure reporting strategies, and requires full knowledge of the prior and conditional probability 
distributions to compute the rewards. 

The Bayesian Truth Serum (BTS) [?] is another mechanism for information elicitation with unobservable 
ground truth. BTS does not use the knowledge of a common prior to compute rewards, but rather collects 
two reports from each agent — an 'information' report which is the agent's own observation, as well as a 
'prediction' report which is the agent's prediction about the distribution of information reports from the 
population — and uses these to compute rewards such that truthful reporting is the highest-reward Nash 
equilibrium of the BTS mechanism. In addition to the key difference of exogenous versus endogenous 
proficiencies discussed above, an important limitation of BTS in the crowdsourcing setting is that it requires 
the number of agents $n$ reporting on a task to diverge to ensure its incentive properties. This $n \to \infty$
requirement is infeasible in our setting due to the scaling of cost with number of reports as discussed in the 
introduction. [?] provides a robust BTS mechanism (RBTS) that works even for small populations (again 
in the same non-endogenous proficiency model as BTS and peer prediction mechanisms), and also ensures 
payments are positive, making the mechanism ex-post individually rational in contrast to BTS. However, the 
RBTS mechanism does not retain the property of truthtelling being the highest reward Nash equilibrium — 
indeed, the 'blind agreement' equilibrium via constant reports achieves the maximum possible reward in 
RBTS, whereas truthtelling might in fact lead to lower rewards. 

There is also work on information elicitation in conducting surveys and online polling [?, ?], both of 
which are not quite appropriate for our crowdsourcing setting. The mechanism in [?] is weakly incentive 
compatible (agents are indifferent between lying and truthtelling), while [?] presents an online mechanism
that is not incentive compatible in the sense that we use and potentially requires a large (constant) number 
of agents to converge to the true result. For other work on information elicitation, albeit in settings very 
different from ours, see [?, ?, ?]. 

We note also that we model settings where there is indeed a notion of a ground truth, albeit unobservable, 
so that proficient agents who put in effort are more likely than not to correctly observe this ground truth. 
Peer-prediction methods, as well as the Bayesian truth serum, are designed for settings where there may be 
no underlying ground truth at all, and the mechanism only seeks to elicit agents' true observations (whatever 
they are) which means that some agents might well be in the minority even when they truthfully report 
their observation — this makes the peer prediction setting 'harder' along the dimension of inducing truthful 
reports, but easier along the dimension of not needing to incentivize agents to choose to exert effort to make 
high-proficiency observations. 

We note that our problem can also be cast as a version of a principal-agent problem with a very large 
number of agents, although the principal cannot directly observe an agent's 'output' as in standard models. 
While there is a vast literature in economics on the principal-agent problem too large to describe here (see, 
e.g., [?] and references therein), none of this literature, to the best of our knowledge, addresses our problem.
Finally, there is also a large orthogonal body of work on the problem of learning unknown (but exogenous) 
agent proficiencies, as well as on the problem of optimally combining reports from agents with differing 
proficiencies to come up with the best aggregate evaluation in various models and settings. These problems 
of learning exogenous agent proficiencies and optimally aggregating agent reports are orthogonal to our 
problem of providing incentives to agents with endogenous, effort-dependent proficiencies to elicit the best 
possible evaluations from them.

2 Model



We now present a simple abstraction of the problem of designing mechanisms for crowdsourced judgement
elicitation settings where agents' proficiencies are determined by strategic effort choice.

Tasks. There are $m$ tasks, or objects, $j = 1, \ldots, m$, where each task has some underlying 'true quality',
or type, $X_j$. This true type $X_j$ is unknown to the system. We assume that the types are binary-valued: $X_j$
is either $H$ (or 1, corresponding to high quality) or $L$ (or 0, for low quality) for all $j$; we use 1 and $H$ (resp.
0 and $L$) interchangeably throughout for convenience. The prior probabilities of $H$ and $L$ for all tasks are
denoted by $\mathcal{P}[H]$ and $\mathcal{P}[L]$. We assume throughout that $\max\{\mathcal{P}[H], \mathcal{P}[L]\} < 1$, i.e., that there is at least
some uncertainty in the underlying qualities of the objects.

Agents. There are $n$ workers or agents $i = 1, \ldots, n$ who noisily evaluate, or form judgements on, the
qualities of objects. We say agent $i$ performs task $j$ if $i$ evaluates object $j$. Agent $i$'s judgement on task $j$ is
denoted by $X_{ij} \in \{0, 1\}$, where $X_{ij}$ is 0 if $i$ evaluates $j$ to be of type $L$ and $X_{ij}$ is 1 if $i$ evaluates it to be
$H$. Having made an evaluation $X_{ij}$, an agent can choose to report any value $\hat{X}_{ij} \in \{0, 1\}$ either based on,
or independent of, her actual evaluation $X_{ij}$.

We denote the set of tasks performed by an agent $i$ by $J(i)$, and let $I(j)$ denote the set of agents who
perform task $j$. We will assume for notational simplicity that $|J(i)| = D$ and $|I(j)| = T$ for all agents $i$ and
tasks $j$.

Proficiency. An agent's proficiency at a task is the probability with which she correctly evaluates its true
type or quality. We assume that an agent's proficiency is an increasing function of the effort she puts into
making her evaluation. Let $e_{ij}$ denote agent $i$'s effort level for task $j$: we assume for simplicity that effort
is binary-valued, $e_{ij} \in \{0, 1\}$. Putting in 0 effort has cost $c_{ij}(0) = 0$, whereas putting in full effort has cost
$c_{ij}(1) > 0$ (we note that our results also extend to a linear model with continuous effort where $e_{ij} \in [0, 1]$
and the probability $p_i(e_{ij})$ of correctly observing $X_j$ as well as the cost $c_i(e_{ij})$ increase linearly with $e_{ij}$).

An agent who puts in zero effort makes evaluations with proficiency $p_{ij}(0) = 1/2$ and does no better
than random guessing, i.e., $\Pr(X_{ij} = X_j \mid e_{ij} = 0) = 1/2$. An agent who puts in full effort $e_{ij} = 1$ attains
her maximum proficiency, $\Pr(X_{ij} = X_j \mid e_{ij} = 1) = p_{ij}(1) = p_i$. Note that this maximum proficiency
$p_i$ can be different for individual agents, modeling their different abilities, and need not be known to the
center. We assume that the maximum proficiency $p_i > 1/2$ for all $i$; this minimum requirement on agent
ability can be ensured in online crowdsourcing settings by prescreening workers on a representative set of
tasks (Amazon Mechanical Turk, for instance, offers the ability to prescreen workers [?, ?], whereas in peer-grading
applications such as on Coursera, students are given a set of pre-graded assignments to measure
their grading abilities prior to grading their peers, the results of which can be used as a prescreen).

We note that our results also extend easily to the case where the maximum proficiency of an agent depends
on whether the object is of type $H$ or $L$, i.e., the probabilities of correctly observing the ground truth when
putting in full effort are different for different ground truths, $\Pr(X_{ij} = X_j \mid X_j = H) \neq \Pr(X_{ij} = X_j \mid X_j = L)$
(of course, different agents can continue to have different maximum proficiencies).

Strategies. Agents strategically choose both their effort levels and reports on each task to maximize their 
total utility, which is the difference between the reward received for their reports and the cost incurred in 
making evaluations. Formally, an agent $i$'s strategy is a vector of $D$ tuples $[(e_{ij}, f_{ij})]$, specifying her effort
level $e_{ij}$ as well as the function $f_{ij}$ she uses to map her actual evaluation $X_{ij}$ into her report $\hat{X}_{ij}$ for each of
her tasks. Note that since an agent's proficiency on a task $p_{ij}$ is a function of her strategically chosen effort
$e_{ij}$, the proficiency of agent $i$ for task $j$ is endogenous in our model.

For a single task, we use the notation $(1, X)$ to denote the choice of full effort $e_{ij} = 1$ and truthfully
reporting one's evaluation (i.e., $f_{ij}$ is the identity function, $\hat{X}_{ij} = X_{ij}$), $(1, X^c)$ to denote full effort followed
by inverting one's evaluation, and $(0, r)$ to denote the choice of exerting no effort ($e_{ij} = 0$) and simply
reporting the outcome of a random coin toss with probability $r$ of returning $H$. We use $[(1, X)]$ to denote
the strategy of using full effort and truthtelling on all of an agent's tasks, and similarly $[(1, X^c)]$ and $[(0, r)]$
for the other strategies.
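
As a rough illustration of the model above, the following Python sketch simulates a single evaluation and report. The function names, the tuple encoding of strategies, and the linear interpolation of proficiency between 1/2 and the maximum proficiency for intermediate effort are assumptions made for this sketch, not notation from the paper.

```python
import random


def proficiency(effort, p_max):
    """Probability of a correct evaluation at a given effort level.

    Assumption of this sketch: proficiency is linear in effort, from 1/2
    (no effort, random guessing) up to p_max (full effort).
    """
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must lie in [0, 1]")
    if not 0.5 < p_max <= 1.0:
        raise ValueError("maximum proficiency must exceed 1/2")
    return 0.5 + effort * (p_max - 0.5)


def evaluate(true_type, effort, p_max, rng):
    """Noisy evaluation of a task whose true type is 0 (L) or 1 (H)."""
    correct = rng.random() < proficiency(effort, p_max)
    return true_type if correct else 1 - true_type


def report(strategy, true_type, p_max, rng):
    """Report under one of the three single-task strategies.

    strategy is ("truthful",), ("invert",) or ("blind", r), a hypothetical
    encoding of (1, X), (1, X^c) and (0, r) respectively.
    """
    kind = strategy[0]
    if kind == "blind":
        r = strategy[1]
        return 1 if rng.random() < r else 0  # coin toss, no effort spent
    x = evaluate(true_type, 1, p_max, rng)   # full effort
    if kind == "truthful":
        return x
    if kind == "invert":
        return 1 - x
    raise ValueError(f"unknown strategy {kind!r}")
```

For instance, `report(("blind", 0.7), true_type, p_max, rng)` ignores `true_type` entirely, which is exactly the kind of report the mechanism is meant to leave unrewarded.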

Mechanisms. A mechanism in this setting takes as input the set of all received reports $\hat{X}_{ij}$ and computes
a reward for each agent based on her reports, as well as possibly the reports of other agents. Note that
the mechanism has no access^3 to the underlying true qualities $X_j$ for any task, and so cannot use the $X_j$
to determine agents' rewards. A set of effort levels and reporting functions $[(e_{ij}, f_{ij})]$ is a full-information
Nash equilibrium of a mechanism if no agent $i$ can strictly improve her expected utility by choosing either
a different effort level $e_{ij}$, or a different function $f_{ij}$ to map her evaluation $X_{ij}$ into her report $\hat{X}_{ij}$. Here,
the expectation is over the randomness in agents' noisy evaluation of the underlying ground truth, as well
as any randomness in the mechanism.

^3 Crowdsourcing is used typically precisely in scenarios where the number of tasks is too large for the principal (or a set of

We will be interested in designing mechanisms for which it is (i) an equilibrium for all agents to put in
full effort and report their evaluations truthfully on all tasks, i.e., use strategies $[(1, X)]$, and (ii) for which
$[(1, X)]$ is the maximum utility (if not unique) equilibrium. We emphasize here that we do not address
the problem of how to optimally aggregate the $T$ reports $\hat{X}_{ij}$ for task $j$ into a final estimate of $X_j$: this
is an orthogonal problem requiring application-specific modeling; our only goal is to elicit the best possible 
judgements to aggregate, by ensuring that agents find it most profitable to put in maximum effort into their 
evaluations and then report these evaluations truthfully. 

3 Mechanism 

The main idea behind our mechanism $\mathcal{M}$ is the following. Recall that a mechanism does not have access to the
true qualities $X_j$, and therefore must compute rewards for agents that do not rely on directly observing $X_j$.
Since the only source of information about $X_j$ comes from the reports $\hat{X}_{ij}$, a natural solution is to reward
based on some form of agreement between different agents reporting on $j$, similar to the peer-prediction
setting [?]. However, an easy way for agents to achieve perfect agreement with no effort is to always report
$H$ (or $L$). With just one task, it is difficult for a mechanism to distinguish between the scenario where agents
achieve agreement by making accurate, high-effort evaluations of the same ground truth, and the low-effort
scenario where agents achieve agreement by always reporting $H$, especially if $\mathcal{P}[H]$ is high. However, in our
setting, we have the benefit of multiple tasks and ratings, which could potentially be used to distinguish 
between these two strategies and appropriately reward agents to incentivize high effort. 

$\mathcal{M}$ uses the presence of multiple ratings to subtract out a statistic term $B_{ij}$ from the agreement score,
chosen so that there is no benefit to making reports that are independent of $X_j$. Roughly speaking, $\mathcal{M}$
rewards an agent $i$ for her report on task $j$ for agreeing with another 'reference' agent $r_j(i)$'s report on the
same task, but only beyond what would be expected if $i$ and $r_j(i)$ were randomly tossing coins with their
respective empirical frequencies of heads.

Let $d$ denote the number of other reports made by $i$ and $r_j(i)$ that are used in the computation of this
statistic term $B_{ij}$ based on the observed frequency of heads for each pair $(i, j)$. We use $\mathcal{M}^d$ to denote the
version of $\mathcal{M}$ which uses $d$ other reports from each of $i$ and $r_j(i)$ to compute $B_{ij}$. To completely specify $\mathcal{M}^d$,
we also need to specify a reference rater $r_j(i)$ as well as this set of $d$ (non-overlapping) tasks performed by $i$
and $r_j(i)$, for which we use the following notation. (We require these $d$ other tasks to be non-overlapping so
that the reports for these tasks, $\hat{X}_{ik}$ and $\hat{X}_{r_j(i)l}$, are independent,^4 which is necessary to achieve the incentive
properties of $\mathcal{M}^d$.)

Definition 1 ($S_{ij}$, $S_{r_j(i)j}$). Consider agent $i$ and task $j \in J(i)$, and a reference rater $r_j(i)$. Given a value
of $d$ ($1 \le d \le D - 1$), let $S_{ij}$ and $S_{r_j(i)j}$ be sets of $d$ non-overlapping tasks other than task $j$ performed by $i$
and $r_j(i)$ respectively, i.e.,

$$S_{ij} \subseteq J(i) \setminus \{j\}, \quad S_{r_j(i)j} \subseteq J(r_j(i)) \setminus \{j\}, \quad S_{ij} \cap S_{r_j(i)j} = \emptyset, \quad |S_{ij}| = |S_{r_j(i)j}| = d.$$

A mechanism $\mathcal{M}^d$ is completely specified by reference raters $r_j(i)$ and the sets $S_{ij}$ and $S_{r_j(i)j}$, and
rewards agents as defined below. Note that $\mathcal{M}^d$ only uses agents' reports $\hat{X}_{ij}$ to compute rewards and not
their maximum proficiencies $p_i$, which therefore need not be known to the system.
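
One way to pick the two non-overlapping task sets of Definition 1 can be sketched as follows. This is only an illustrative greedy procedure under the assumption that task identifiers are sortable; the function name and the choice of which tasks to prefer are hypothetical and not prescribed by the definition.

```python
def choose_statistic_sets(tasks_i, tasks_r, j, d):
    """Pick disjoint sets of d tasks each for agent i and her reference rater.

    tasks_i, tasks_r: iterables of task ids performed by the two agents.
    Task j itself is excluded from both sets. Returns (s_i, s_r) as sorted
    lists, or None when no valid choice exists.
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    others_i = set(tasks_i) - {j}
    others_r = set(tasks_r) - {j}
    only_i = sorted(others_i - others_r)
    only_r = sorted(others_r - others_i)
    shared = sorted(others_i & others_r)

    # Tasks performed by only one of the two agents can never overlap,
    # so they are used first.
    s_i = only_i[:d]
    s_r = only_r[:d]
    need_i = d - len(s_i)
    need_r = d - len(s_r)
    if need_i + need_r > len(shared):
        return None
    # Split the shared tasks so that each is assigned to at most one agent.
    s_i += shared[:need_i]
    s_r += shared[need_i:need_i + need_r]
    return sorted(s_i), sorted(s_r)
```

With $d = D - 1$ the procedure succeeds only when the two agents share no task other than $j$, matching the stricter non-overlap requirement in that case.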



trusted agents chosen by the principal) to carry out herself, so it is at best feasible to verify the ground truth for a tiny fraction 
of all tasks, which fraction turns out to be inadequate (a formal statement is omitted here) to incentivize effort using knowledge 
of the $X_j$.

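^4 We assume that co-raters' identities are kept unknown to agents, so there is no collusion between $i$ and $r_j(i)$.

Definition 2 (Mechanism $\mathcal{M}^d$). $\mathcal{M}^d$ computes an agent $i$'s reward for her report $\hat{X}_{ij} \in \{0, 1\}$ on task $j$,
$R_{ij}$, by comparing against a 'reference rater' $r_j(i)$'s report $\hat{X}_{r_j(i)j}$ for $j$, as follows:

$$R_{ij} = A_{ij} - B_{ij}, \quad \text{where} \tag{1}$$

$$A_{ij} = \hat{X}_{ij} \hat{X}_{r_j(i)j} + (1 - \hat{X}_{ij})(1 - \hat{X}_{r_j(i)j}),$$

$$B_{ij} = \left(\frac{\sum_{k \in S_{ij}} \hat{X}_{ik}}{d}\right)\left(\frac{\sum_{l \in S_{r_j(i)j}} \hat{X}_{r_j(i)l}}{d}\right) + \left(1 - \frac{\sum_{k \in S_{ij}} \hat{X}_{ik}}{d}\right)\left(1 - \frac{\sum_{l \in S_{r_j(i)j}} \hat{X}_{r_j(i)l}}{d}\right),$$

where the sets $S_{ij}$ and $S_{r_j(i)j}$ in $B_{ij}$ are as in Definition 1. The final reward to an agent $i$ is $\beta R_i$, where
$R_i = \sum_{j \in J(i)} R_{ij}$ and $\beta$ is simply a non-negative scaling parameter that is chosen based on agents' costs of
effort.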

The first term, $A_{ij}$, in $R_{ij}$ is an 'agreement' reward, and is 1 when $i$ and $r_j(i)$ both agree on their report,
i.e., when $\hat{X}_{ij} = \hat{X}_{r_j(i)j} = 1$ or when $\hat{X}_{ij} = \hat{X}_{r_j(i)j} = 0$, and 0 otherwise. The second term $B_{ij}$ is the 'statistic' term which,
roughly speaking, deducts from the agreement reward whatever part of $i$ and $r_j(i)$'s agreement on task $j$ is to
be 'expected anyway' given their reporting statistics, i.e., the relative frequencies with which they report $H$
and $L$. This deduction is what gives $\mathcal{M}$ its nice incentive properties: while $\mathcal{M}$ rewards agents for agreement
via $A_{ij}$, $\mathcal{M}$ also penalizes for blind agreement that agents achieve without effort, by subtracting out the $B_{ij}$
term corresponding to the expected frequency of agreement if $i$ and $r_j(i)$ were randomly choosing reports
corresponding to their estimated means.
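
The two terms described above can be sketched in a few lines of Python. The sketch reads the statistic term as the probability that two independent coins, biased according to each agent's empirical frequency of reporting $H$ on her $d$ other tasks, would agree; the function and argument names are illustrative assumptions.

```python
def agreement(report_i, report_r):
    """Agreement term: 1 if the two binary reports on task j match, else 0."""
    return report_i * report_r + (1 - report_i) * (1 - report_r)


def statistic(other_reports_i, other_reports_r):
    """Statistic term computed from d other reports of each agent.

    The two lists hold the binary reports on the non-overlapping task sets
    of agent i and of her reference rater; both must have the same length d.
    """
    d = len(other_reports_i)
    if d == 0 or len(other_reports_r) != d:
        raise ValueError("both agents must contribute the same number d >= 1 of reports")
    mean_i = sum(other_reports_i) / d
    mean_r = sum(other_reports_r) / d
    # Chance that two independent coins with these biases would agree.
    return mean_i * mean_r + (1 - mean_i) * (1 - mean_r)


def reward(report_i, report_r, other_reports_i, other_reports_r):
    """Reward for one task: agreement minus the 'expected anyway' part."""
    return agreement(report_i, report_r) - statistic(other_reports_i, other_reports_r)
```

For example, `reward(1, 1, [1, 1, 1], [1, 1, 1])` is 0, since agents who always report $H$ agree on task $j$ but their statistic term is also 1.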

For example, suppose all agents were to always report $H$. Then $A_{ij}$ is always 1, but $B_{ij} = 1$ as well so
that the net reward is 0; similarly, if agents chose their reports according to a random cointoss, even one
with the 'correct' bias $\mathcal{P}[H]$, the expected value of $A_{ij}$ is exactly equal to that of $B_{ij}$ since there is no
correlation between the reports for a particular task, again leading to an expected reward of 0. The reward
function $R_{ij}$ is designed so that it only rewards agents when they put effort into their evaluations, which
leads to the desirable incentive properties of $\mathcal{M}_d$. (We note that there are other natural statistics which
might incentivize agents away from low-effort reports, e.g., rewarding reports which collectively have an
empirical mean close to $\mathcal{P}[H]$, or for variance. However, it turns out that appropriately balancing the
agreement term (which is necessary to ensure agents cannot simply report according to a cointoss with bias
$\mathcal{P}[H]$) with a term penalizing blind agreement to simultaneously ensure that $[(1, X)]$ is an equilibrium and
the most desirable equilibrium is hard to accomplish.)
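The reward computation described above can be sketched in a few lines. This is only an illustrative sketch, assuming reports are encoded as 1 for $H$ and 0 for $L$; the function names `agreement`, `statistic`, and `reward` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def agreement(x_i: int, x_ref: int) -> int:
    """A_ij: 1 if the two binary reports on the common task agree, else 0."""
    return x_i * x_ref + (1 - x_i) * (1 - x_ref)


def statistic(other_i: Sequence[int], other_ref: Sequence[int]) -> float:
    """B_ij: agreement 'expected anyway' from the empirical H-frequencies.

    other_i and other_ref hold the d reports of agent i and of the reference
    rater on their non-overlapping task sets (assumed encoding: H=1, L=0).
    """
    if len(other_i) != len(other_ref) or len(other_i) == 0:
        raise ValueError("both agents need the same positive number d of reports")
    freq_i = sum(other_i) / len(other_i)
    freq_ref = sum(other_ref) / len(other_ref)
    return freq_i * freq_ref + (1 - freq_i) * (1 - freq_ref)


def reward(x_i: int, x_ref: int,
           other_i: Sequence[int], other_ref: Sequence[int]) -> float:
    """R_ij = A_ij - B_ij for one agent-task pair."""
    return agreement(x_i, x_ref) - statistic(other_i, other_ref)


# Everyone always reports H: agreement is 1, but so is the statistic term.
blind = reward(1, 1, [1, 1, 1], [1, 1, 1])

# Agreement on the common task with mixed reports elsewhere earns a positive reward.
informative = reward(1, 1, [1, 0, 0], [0, 1, 0])
```

In the first call the blind agreement is cancelled exactly by the statistic term, which is the behavior the example in the text describes.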

There are two natural choices for the parameter $d$, i.e., how many reports of $i$ and $r_j(i)$ to include
for estimating the statistic term that we subtract from the agreement score in $\mathcal{M}_d$. (i) In $\mathcal{M}_{D-1}$, we set
$d = D - 1$ and include all reports of agents $i$ and $r_j(i)$, except those on their common task $j$. Here, the
non-overlap requirement for sets $S_{ij}$ and $S_{r_j(i)j}$ says that an agent $i$ and her reference rater $r_j(i)$ for task
$j$ have only that task $j$ in common. (ii) In $\mathcal{M}_1$, we set $d = 1$, i.e., subtract away the correlation between
the report of $i$ and $r_j(i)$ on exactly one other non-overlapping task. In $\mathcal{M}_1$, the non-overlap condition only
requires that for each agent-task pair, there is a reference agent $r_j(i)$ available who has rated one other task
that is different from the remaining tasks rated by $i$, a condition that is much easier to satisfy than that in
$\mathcal{M}_{D-1}$. In §4.2 we will see that $\mathcal{M}_1$ will require that the choices of $j'$, where $\{j'\} = S_{ij}$ is the task used
in the statistic term of $i$'s reward for task $j$, are such that each task $j'$ performed by $i$ is used exactly once
to determine $R_{ij}$ for $j \neq j'$. Note that this is always feasible, for instance by using task $j + 1$ in the statistic
term for task $j$ for $j = 1, \ldots, D - 1$ and task 1 for task $D$.
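The cyclic choice mentioned at the end of the paragraph (task $j + 1$ for task $j$, wrapping around to task 1 for task $D$) can be written out directly. A minimal sketch, assuming tasks are indexed $1, \ldots, D$; the helper name `cyclic_statistic_tasks` is hypothetical.

```python
from typing import Dict


def cyclic_statistic_tasks(D: int) -> Dict[int, int]:
    """Map each task j in 1..D to the single task j' used in its statistic term.

    Task j uses task j + 1, and task D wraps around to task 1, so that every
    task serves as the statistic task for exactly one other task.
    """
    if D < 2:
        raise ValueError("need at least two tasks so that j' differs from j")
    return {j: j % D + 1 for j in range(1, D + 1)}


assignment = cyclic_statistic_tasks(5)
```

Because the map is a cyclic shift, each task appears exactly once among the values and never as its own statistic task.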



4 Analyzing $\mathcal{M}_d$

In this section, we analyze equilibrium behavior in $\mathcal{M}_d$. We begin with some notation and preliminaries.



4.1 Preliminaries 

Recall that proficiency is the probability of correctly evaluating the true quality. We use $p[H]$ (respectively
$p[L]$) to denote the probability that an agent observes $H$ (respectively $L$) when making evaluations with
proficiency $p$, i.e., the probability that $X_{ij} = H$ is $p[H] = p\mathcal{P}[H] + (1 - p)\mathcal{P}[L]$. Similarly, $q[H], q[L]$ and
$p_i[H], p_i[L]$ correspond to the probabilities of seeing $H$ and $L$ when making evaluations with proficiencies $q$
and $p_i$ respectively.

Footnote: Understanding the effect of the parameter $d$ in our mechanisms, which appears irrelevant to the mechanism's
behavior when agents are risk-neutral, is an interesting open question.



Matrix representation of strategies. We will frequently need to consider the space of all possible strategies an
agent may use in the equilibrium analysis of $\mathcal{M}_d$. While the choice of effort level in an agent's strategy
$[(e_{ij}, f_{ij})]$ is easily described (there are only two possible effort levels, 1 and 0), the space of functions
through which an agent can map her evaluation $X_{ij}$ into her report $\hat{X}_{ij}$ is much larger. For instance, an
agent could choose $f_{ij}$ corresponding to making an evaluation, performing a Bayesian update of her prior
on $X_j$, and choosing the report with the higher posterior probability. We now discuss a way to represent
strategies that will allow us to easily describe the set of all reporting functions $f_{ij}$.

An agent $i$'s evaluation $X_{ij}$ can also be written as a two-dimensional vector $o^{ij} \in \mathbb{R}^2$, where $o^{ij} = [1\ 0]^T$
if $i$ observes an $H$, and $o^{ij} = [0\ 1]^T$ if $i$ observes an $L$, where $a^T$ denotes the transpose of $a$. For the purpose
of analyzing $\mathcal{M}_d$, any choice of reporting function $f_{ij}$ can then be described via a $2 \times 2$ matrix

\[ M^{ij} = \begin{bmatrix} x & 1 - y \\ 1 - x & y \end{bmatrix}, \]

where $x$ is the probability with which $i$ chooses to report $H$ after observing $H$, i.e., $x = \Pr(\hat{X}_{ij} = H \mid X_{ij} = H)$,
and similarly $y = \Pr(\hat{X}_{ij} = L \mid X_{ij} = L)$. Observe that the choice of effort affects only $o^{ij}$ and its
'correctness', or correlation with the (vector representing the) actual quality $X_j$, and the choice of reporting
function $f_{ij}$ only affects $M^{ij}$.

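Any reporting matrix $M^{ij}$ of the form above can be written as a convex combination of four matrices,
one for each of the $f_{ij}$ corresponding to (i) truthful reporting ($\hat{X}_{ij} = X_{ij}$), (ii) inverting ($\hat{X}_{ij} = X^c_{ij}$), and
(iii, iv) always reporting $H$ or $L$ independent of one's evaluation ($\hat{X}_{ij} = H$ and $\hat{X}_{ij} = L$ respectively):

\[ M_X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
   M_{X^c} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
   M_H = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \quad
   M_L = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}. \]

That is, $M^{ij} = \alpha_1 M_X + \alpha_2 M_{X^c} + \alpha_3 M_H + \alpha_4 M_L$, where $\alpha_1 = x - \alpha_3$, $\alpha_2 = 1 - y - \alpha_3$, and $\alpha_3 = x - y$ and
$\alpha_4 = 0$ if $x \geq y$, and $\alpha_3 = 0$ and $\alpha_4 = y - x$ if $y > x$. It is easily verified that $\alpha_i \geq 0$, and $\sum_i \alpha_i = 1$, so that
this is a convex combination. Since all possible reporting strategies $f_{ij}$ can be described by appropriately
choosing the values of $x \in [0, 1]$ and $y \in [0, 1]$ in $M^{ij}$, every reporting function $f_{ij}$ can be written as a convex
combination of these four matrices.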
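The decomposition into the four basis reporting matrices can be checked mechanically. The sketch below assumes the column convention in which the first column of a reporting matrix is the report distribution after observing $H$ and the second after observing $L$; the names `decompose` and `combine` are hypothetical helpers, not notation from the paper.

```python
from typing import Tuple

Matrix = Tuple[Tuple[float, float], Tuple[float, float]]

# Basis reporting matrices: truthful, inverting, always-H, always-L.
M_X: Matrix = ((1, 0), (0, 1))
M_XC: Matrix = ((0, 1), (1, 0))
M_H: Matrix = ((1, 1), (0, 0))
M_L: Matrix = ((0, 0), (1, 1))


def decompose(x: float, y: float) -> Tuple[float, float, float, float]:
    """Weights (a1, a2, a3, a4) on (M_X, M_XC, M_H, M_L) for given x and y."""
    a3 = max(x - y, 0.0)   # weight on always-H is used only when x >= y
    a4 = max(y - x, 0.0)   # weight on always-L is used only when y > x
    a1 = x - a3
    a2 = 1.0 - y - a3
    return a1, a2, a3, a4


def combine(weights: Tuple[float, float, float, float]) -> Matrix:
    """Form the weighted sum of the four basis matrices."""
    basis = (M_X, M_XC, M_H, M_L)
    return tuple(
        tuple(sum(w * b[r][c] for w, b in zip(weights, basis)) for c in range(2))
        for r in range(2)
    )


weights = decompose(0.9, 0.6)
matrix = combine(weights)   # should reproduce [[x, 1 - y], [1 - x, y]]
```

For $x = 0.9$ and $y = 0.6$ the weights are nonnegative and sum to one, and the recombined matrix has entries $x$, $1 - y$, $1 - x$, $y$ up to floating-point rounding.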



The agent's final report $\hat{X}_{ij}$ is then described by the vector $M^{ij}o^{ij} \in \mathbb{R}^2$, where the first entry is the
probability that $i$ reports $H$, i.e., $\hat{X}_{ij} = H$, and the second entry is the probability that she reports $\hat{X}_{ij} = L$.
The expected reward of agent $i$ for task $j$ can therefore be written using the matrix-vector representation
(where $^T$ denotes transpose and $\mathbf{1}$ is the vector of all ones) as

\[ E[R_{ij}] = E\big[(M^{r_j(i)j}o^{r_j(i)j})^T M^{ij}o^{ij} + (\mathbf{1} - M^{r_j(i)j}o^{r_j(i)j})^T(\mathbf{1} - M^{ij}o^{ij})\big] \]
\[ \qquad - \big[(M^{r_j(i)j}E[o^{r_j(i)j}])^T M^{ij}E[o^{ij}] + (\mathbf{1} - M^{r_j(i)j}E[o^{r_j(i)j}])^T(\mathbf{1} - M^{ij}E[o^{ij}])\big], \]

which is linear in $M^{ij}$. So the payoff from an arbitrary reporting function $f_{ij}$ can be written as the
corresponding linear combination of the payoffs from each of the 'basis' functions (corresponding to $M_X$, $M_{X^c}$, $M_H$
and $M_L$) constituting $f_{ij} = M^{ij}$. We will use this to argue that it is adequate to consider deviations to each
of the remaining basis reporting functions and show that they yield strictly lower reward to establish that
$[(1, X)]$ is an equilibrium of $\mathcal{M}_d$.



Equivalent strategies. For the equilibrium analysis, we will use the following simple facts. (i) The strategy
$(0, X)$ (i.e., using zero effort but truthfully reporting one's evaluation) is equivalent to the strategy $(0, r)$
with $r = 1/2$, i.e., to the strategy of putting in no effort, and randomly reporting $H$ or $L$ independent of the
evaluation $X_{ij}$ with probability 1/2 each. (ii) The strategy $(1, r)$ is equivalent to the strategy $(0, r)$, since
the report $\hat{X}_{ij}$ in both cases is completely independent of the evaluation $X_{ij}$ and therefore of $e_{ij}$.
Cost of effort. While agents do incur a higher cost when using $e_{ij} = 1$ as compared to $e_{ij} = 0$, we will not
need to explicitly deal with the cost in the equilibrium analysis: if the reward from using a strategy where
$e_{ij} = 1$ is strictly greater than the reward from any strategy with $e_{ij} = 0$, the rewards $R_{ij}$ can always be
scaled appropriately using the factor $\beta$ (in Definition 2) to ensure that the net utility (reward minus cost) is
strictly greater as well.

We remark here that bounds on this scaling factor $\beta$ could be estimated empirically without requiring
knowledge of the priors by estimating the cost of effort $c_i$ from the maximum proficiencies obtained from a
pre-screening ([12]), by conducting a series of trials with increasing rewards and then using individual
rationality to estimate the cost of effort from observed proficiencies in these trials.

Individual rationality and non-negativity of payments. The expected payments made by our
mechanism to each agent are always nonnegative in the full-effort truthful reporting equilibrium, i.e., when all
agents use strategies $[(1, X)]$. To ensure that the payments are also non-negative for every instance (of the
tasks and reports) and not only in expectation, note that it suffices to add 1 to the payments currently
specified, since the penalty term $B_{ij}$ in the reward $R_{ij}$ is bounded above by 1. We also note that individual
rationality can be achieved by using a value of $\beta$ large enough to ensure that the net utility $\beta R_{ij} - c(1)$
remains non-negative for all values of $\mathcal{P}[H]$. While the expected payment $R_{ij}$ does go to zero as $\mathcal{P}[H]$ tends
to 1 (i.e., in the limit of vanishing uncertainty as the underlying ground truth is more and more likely to
always be $H$ (or always be $L$)), as long as there is some bound $\epsilon > 0$ such that $\max\{\mathcal{P}[H], \mathcal{P}[L]\} < 1 - \epsilon$, a
simple calculation can be used to determine a value $\beta^*(\epsilon)$ such that the resulting mechanism with $\beta = \beta^*$
leads to nonnegative utilities for all agents in the full-effort truth-telling Nash equilibrium of $\mathcal{M}_d$.

4.2 Equilibrium analysis 

We now analyze the equilibria of $\mathcal{M}_d$. Throughout, we index the tasks $J(i)$ corresponding to agent $i$ by
$j \in \{1, \ldots, D\}$.

First, to illustrate the idea behind the mechanism, we prove the simpler result that $[(1, X)]$ is an
equilibrium of $\mathcal{M}_d$ when agents all have equal proficiency $p_i = p$, and are restricted to choosing one common
strategy for all their tasks.

Proposition 3. Suppose all agents have the same maximum proficiency $p$, and are restricted to choosing
the same strategy for each of their tasks. Then, all agents choosing $[(1, X)]$ is an equilibrium of $\mathcal{M}_d$ for all
$d$, if $p \neq 1/2$.

Proof. Consider an agent $i$, and suppose all other agents use the strategy $(1, X)$ on all their tasks. As
discussed in the preliminaries, an agent's reward is linear in her reports fixing the strategies of other agents,
so it will be enough to show that there is no beneficial deviation to $(1, X^c)$, or $(0, r)$ for any $r \in [0, 1]$, to
establish an equilibrium (as noted earlier, the choice of effort level is irrelevant when reporting $\hat{X}_{ij}$ according
to the outcome of a random coin toss independent of the observed value of $X_{ij}$). The reward to agent $i$
when she uses strategy $[(1, X)]$ is

\[ E[R_i((1, X))] = \sum_{j=1}^{D} E\big[X_{ij}X_{r_j(i)j} + (1 - X_{ij})(1 - X_{r_j(i)j})\big] \]
\[ \qquad - \sum_{j=1}^{D} E\left[\frac{\sum_{k \in S_{ij}} X_{ik}}{d} \cdot \frac{\sum_{l \in S_{r_j(i)j}} X_{r_j(i)l}}{d} + \left(1 - \frac{\sum_{k \in S_{ij}} X_{ik}}{d}\right)\left(1 - \frac{\sum_{l \in S_{r_j(i)j}} X_{r_j(i)l}}{d}\right)\right] \]
\[ = D\big[p^2 + (1 - p)^2 - (p[H]^2 + (1 - p[H])^2)\big] \]
\[ = 2D(p - p[H])(p - p[L]) \]
\[ = 2D(2p - 1)^2\,\mathcal{P}[H]\,\mathcal{P}[L], \]

which is strictly positive if $p \neq 1/2$ and $\min(\mathcal{P}[H], \mathcal{P}[L]) > 0$ (as assumed throughout), where we use
$p - p[H] = (2p - 1)\mathcal{P}[L]$ and $p - p[L] = (2p - 1)\mathcal{P}[H]$. The expected reward from deviating to $(1, X^c)$, when
other agents are using $(1, X)$, is

\[ E[R_i((1, X^c))] = D\big(2p(1 - p) - 2p[H](1 - p[H])\big) = -2D(p - p[H])(p - p[L]). \]

Therefore, the expected reward from deviating to $(1, X^c)$ is negative and strictly smaller than the reward
from $(1, X)$ if $p \neq 1/2$. Finally, suppose agent $i$ deviates to playing $(0, r)$, i.e., reporting the outcome of a
random coin toss with bias $r$ as her evaluation of $X_{ij}$. Her expected reward from using this strategy when
other agents play according to $(1, X)$ is

\[ E[R_i((0, r))] = D\big(r\,p[H] + (1 - r)(1 - p[H]) - (r\,p[H] + (1 - r)(1 - p[H]))\big) = 0. \]

(In fact, if either agent reports ratings on her tasks by tossing a random coin with any probability $r \in [0, 1]$,
independent of the underlying true realization of $X_{ij}$, the expected reward to agent $i$ is 0.) Therefore, if
$p \neq 1/2$, deviating from $(1, X)$ leads to a strict decrease in reward to agent $i$. Hence, the rewards can
always be scaled appropriately to ensure that $[(1, X)]$ is an equilibrium of $\mathcal{M}_d$ for any values of the costs
$c_i$. □
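The three cases in the proof can be checked numerically for a single task. The sketch below is an illustration under the proof's assumptions: the reference rater plays $(1, X)$ with the same proficiency $p$, the two agents' errors are independent given the true quality, and the statistic term is computed on non-overlapping tasks, so its expectation is a product of the two agents' $H$-frequencies. The function names and the string labels for strategies are hypothetical.

```python
def prob_H(p: float, prior_H: float) -> float:
    """p[H]: probability of observing H with proficiency p and prior P[H]."""
    return p * prior_H + (1 - p) * (1 - prior_H)


def expected_reward_per_task(strategy: str, p: float, prior_H: float,
                             r: float = 0.5) -> float:
    """Expected R_ij for agent i when the reference rater plays (1, X).

    strategy is 'truthful' for (1, X), 'invert' for (1, X^c), or 'random'
    for (0, r), where r is the probability of reporting H.
    """
    ref_H = prob_H(p, prior_H)            # reference rater reports H w.p. p[H]
    if strategy == "truthful":
        agree = p * p + (1 - p) * (1 - p)  # both right or both wrong
        own_H = ref_H
    elif strategy == "invert":
        agree = 2 * p * (1 - p)            # exactly one of the two is right
        own_H = 1 - ref_H
    elif strategy == "random":
        agree = r * ref_H + (1 - r) * (1 - ref_H)
        own_H = r
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Expected statistic term: product form, since the reports used are on
    # non-overlapping (hence independent) tasks.
    stat = own_H * ref_H + (1 - own_H) * (1 - ref_H)
    return agree - stat


truthful = expected_reward_per_task("truthful", p=0.8, prior_H=0.7)
inverted = expected_reward_per_task("invert", p=0.8, prior_H=0.7)
random_report = expected_reward_per_task("random", p=0.8, prior_H=0.7, r=0.3)
```

With these illustrative parameter values, the truthful reward is positive, the inverted reward is its negative, and the random-report reward is zero, in line with the three cases of the proof.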

We will now move on to proving our main equilibrium result for $\mathcal{M}_d$, where agents can have different
maximum proficiencies, as well as possibly use different strategies for different tasks. We begin with a
technical lemma and a definition.

Lemma 4. Let $f_\alpha(p, q) = pq + (1 - p)(1 - q) - \alpha(p[H]q[H] + (1 - p[H])(1 - q[H]))$. If $\alpha \leq 1$, (i) $f_\alpha(p, q)$
is strictly increasing in $p$ if $q > 1/2$, and strictly increasing in $q$ if $p > 1/2$. (ii) $f_\alpha(p, q)$ is nonnegative if
$p, q \geq 1/2$, and positive if $p, q > 1/2$. (iii) Denote $f(p, q) = f_1(p, q)$. Then, $f(p, q) = f(q, p) = f(1 - p, 1 - q)$.
Also $f(p, 1 - q) = f(1 - p, q) = -f(p, q)$.

Proof. Recall that $p[H] = p\mathcal{P}[H] + (1 - p)\mathcal{P}[L]$, and similarly for $q[H]$.

\[ f_\alpha(p, q) = p(2q - 1) + (1 - q) - \alpha\big[(p\mathcal{P}[H] + (1 - p)\mathcal{P}[L])(2q[H] - 1) + (1 - q[H])\big] \]
\[ = p\big[(2q - 1) - \alpha(\mathcal{P}[H] - \mathcal{P}[L])(2q[H] - 1)\big] + K_{-p} \]
\[ = p(2q - 1)\big(1 - \alpha(\mathcal{P}[H] - \mathcal{P}[L])^2\big) + K_{-p}, \]

where $K_{-p}$ is a term that does not depend on $p$, and we use $2q[H] - 1 = (2q - 1)(\mathcal{P}[H] - \mathcal{P}[L])$ in the last
step.

Note that $|\mathcal{P}[H] - \mathcal{P}[L]| < 1$ if $\max(\mathcal{P}[H], \mathcal{P}[L]) < 1$, so that $1 - \alpha(\mathcal{P}[H] - \mathcal{P}[L])^2 > 0$ if $\alpha \leq 1$.
Therefore, $f_\alpha(p, q)$ is linear in $p$ with strictly positive coefficient when $q > 1/2$ and $\alpha \leq 1$. An identical
argument can be used for $q$ since $f_\alpha(p, q)$ can be written as a linear function of $q$ exactly as for $p$:

\[ f_\alpha(p, q) = q(2p - 1)\big(1 - \alpha(\mathcal{P}[H] - \mathcal{P}[L])^2\big) + K_{-q}. \]

This proves the first claim.

For nonnegativity of $f_\alpha(p, q)$ on $p \in [1/2, 1]$, we simply argue that $f_1(p, q)$ is increasing in $q$ when
$p \in [1/2, 1]$, and is 0 at $q = 1/2$. So for any $q \geq 1/2$, $f_1(p, q) \geq 0$ for any $p \in [1/2, 1]$. But $f_\alpha(p, q)$ is decreasing
in $\alpha$, so $f_\alpha(p, q)$ is nonnegative for any $\alpha \leq 1$ as well.

The final claims about $f(p, q)$ and $f(1 - p, q)$ can be verified just by substituting the definitions of $p[H]$
and $q[H]$ and from symmetry in $p$ and $q$. □
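The properties stated in the lemma are easy to spot-check numerically. The following sketch evaluates $f_\alpha$ directly from its definition; the function names and the parameter `prior_H` (standing for the prior probability of $H$) are illustrative assumptions, and a numerical check of this kind is of course not a substitute for the proof.

```python
def prob_H(p: float, prior_H: float) -> float:
    """p[H] = p * P[H] + (1 - p) * P[L], with P[L] = 1 - P[H]."""
    return p * prior_H + (1 - p) * (1 - prior_H)


def f_alpha(p: float, q: float, prior_H: float, alpha: float = 1.0) -> float:
    """f_alpha(p, q) as defined in the lemma (alpha = 1 gives f)."""
    pH, qH = prob_H(p, prior_H), prob_H(q, prior_H)
    return p * q + (1 - p) * (1 - q) - alpha * (pH * qH + (1 - pH) * (1 - qH))


prior = 0.7
# Symmetry and sign-flip identities for f = f_1.
symmetric = abs(f_alpha(0.8, 0.6, prior) - f_alpha(0.6, 0.8, prior)) < 1e-12
flipped = abs(f_alpha(0.8, 0.4, prior) + f_alpha(0.8, 0.6, prior)) < 1e-12
# Monotonicity in p when q > 1/2 and alpha <= 1.
increasing = f_alpha(0.9, 0.6, prior, alpha=0.5) > f_alpha(0.7, 0.6, prior, alpha=0.5)
```

All three flags evaluate to true for these illustrative values, matching parts (i) and (iii) of the lemma.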

Definition 5 ($T_{ij}$, $d_{ij}$). Let $T_{ij}$ be the set of all tasks $j' \neq j$ such that $j \in S_{ij'}$, i.e., $T_{ij}$ is the set of tasks
$j'$ for which $i$'s report on task $j$ is used to compute the statistic term of $i$'s reward $R_{ij'}$ for task $j'$. We use
$d_{ij} = |T_{ij}|$ to denote the number of such tasks $j'$.

Our main equilibrium result states that under a mild set of conditions on the choice of reference raters
$r_j(i)$ and sets $T_{ij}$, exerting full effort and reporting truthfully on all tasks is an equilibrium of $\mathcal{M}_d$, even
when agents have different maximum proficiencies and can choose a different strategy for each task (for
instance, an agent could choose to shirk effort on some tasks and put in effort on the others). The main
idea behind this result can be understood from the proof of Proposition 3 above, where all agents had
the same maximum proficiency $p_i = p$ and were restricted to using the same strategy for each task. There,
the expected payoff per task from using $[(1, X)]$ is exactly $f(p, p)$ where $f$ is as defined in Lemma 4, while the
payoff from playing $[(0, r)]$ is 0 (independent of other agents' strategies); the payoff from deviating to $[(1, X^c)]$
when other agents play $[(1, X)]$ is $-f(p, p)$. Since $f(p, p) > 0$ for $p > 1/2$ and increases with $p$, it is a best response
for every agent to attain maximum proficiency and truthfully report her evaluation.

Extending the argument when agents can have both different maximum proficiencies $p_i$ and use different
strategy choices for each task requires more care, and is what necessitates the conditions on the task
assignment in Theorem 6 below. We note that these conditions on $\mathcal{M}_d$ arise because of the generalization to
both differing abilities $p_i$ and being allowed to choose a different strategy for each task: if either generalization
is waived, i.e., if agents can choose different strategies per task but all have equal ability ($p_i = p$), or agents
can have different abilities $p_i$ but are restricted to choosing the same strategy for all their tasks, $[(1, X)]$ can
be shown to be an equilibrium of $\mathcal{M}_d$ even without imposing these conditions.

Theorem 6. Suppose $p_i > 1/2$ for all $i$, and for each agent $i$, for each task $j \in J(i)$, (i) $d_{ij} = d$, and (ii)
$E[p_{r_j(i)}] = E_{j' \in T_{ij}}[p_{r_{j'}(i)}] = \bar{p}_i$, where the expectation is over the randomness in the assignment of reference
raters to tasks and the sets $T_{ij}$. Then, $[(1, X)]$ is an equilibrium of $\mathcal{M}_d$.

The first condition in Theorem 6, $d_{ij} = d$, says that each task $j$ performed by an agent $i$ must contribute
to computing the reward via the statistic term for exactly $d$ other tasks in $J(i)$, where $d$ is the number
of reports used to compute the 'empirical frequency' of $H$ reports by $i$ in the statistic term. The second
condition $E[p_{r_j(i)}] = E_{j' \in T_{ij}}[p_{r_{j'}(i)}]$ says that an agent $i$ should expect the average proficiency of her reference
rater $r_j(i)$ to be equal for all the tasks that she performs, i.e., agent $i$ should not be able to identify any
particular task where her reference raters are, on average, worse than the reference raters for her other tasks
(intuitively, this can lead to agent $i$ shirking effort on this task being a profitable deviation). The first
condition holds for each of the two specific mechanisms $\mathcal{M}_1$ and $\mathcal{M}_{D-1}$, and the second condition can be
satisfied, essentially, by a randomization of the agents before assignment, as described in §5. We now prove
the result.

Proof. Consider agent $i$, and suppose all other agents use strategy $[(1, X)]$, i.e., put in full effort with
truthtelling on all their tasks. It will be enough to consider pure strategy deviations, and show that there is
no beneficial deviation to $(1, X^c)$, or $(0, r)$ for any $r \in [0, 1]$, on any single task or subset of tasks.

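First, consider a particular assignment of reference raters $r_j(i)$ and the sets $S_{ij}$ (and therefore $T_{ij}$). The
total expected reward to agent $i$ from all her $D$ tasks in this assignment, when other agents all play according
to $[(1, X)]$, is

\[ E[R_i] = \sum_{j=1}^{D} E\big[\hat{X}_{ij}X_{r_j(i)j} + (1 - \hat{X}_{ij})(1 - X_{r_j(i)j})\big] - \sum_{j=1}^{D} E\left[\frac{\sum_{k \in S_{ij}} \hat{X}_{ik}}{d} \cdot \frac{\sum_{l \in S_{r_j(i)j}} X_{r_j(i)l}}{d} + \left(1 - \frac{\sum_{k \in S_{ij}} \hat{X}_{ik}}{d}\right)\left(1 - \frac{\sum_{l \in S_{r_j(i)j}} X_{r_j(i)l}}{d}\right)\right] \]
\[ = \sum_{j=1}^{D} E\big[\hat{X}_{ij}X_{r_j(i)j} + (1 - \hat{X}_{ij})(1 - X_{r_j(i)j})\big] - \sum_{j=1}^{D} \left[\frac{\sum_{k \in S_{ij}} E[\hat{X}_{ik}]}{d}\,(2p_{r_j(i)}[H] - 1) + (1 - p_{r_j(i)}[H])\right] \]
\[ = \sum_{j=1}^{D} \left( E\big[\hat{X}_{ij}X_{r_j(i)j} + (1 - \hat{X}_{ij})(1 - X_{r_j(i)j})\big] - \sum_{j' \in T_{ij}} \frac{E[\hat{X}_{ij}]}{d}\,(2p_{r_{j'}(i)}[H] - 1) - (1 - p_{r_j(i)}[H]) \right), \]

where the expectation is over any randomness in the strategy of $i$ as well as randomness in $i$ and $r_j(i)$'s
evaluations for each task $j$, and we rearrange to collect $\hat{X}_{ij}$ terms in the last step.

Now, agent $i$ can receive different reference raters and task sets $S_{ij}$ in different assignments. So to
compute her expected reward, agent $i$ will also take an expectation over the randomness in the assignment
of reference raters to tasks and the sets $S_{ij}$, which appear in the summation above via $T_{ij}$.

Recall the condition that $E[p_{r_j(i)}] = E_{j' \in T_{ij}}[p_{r_{j'}(i)}] = \bar{p}_i$. Using this condition and taking the expectation
over the randomness in the assignments of $r_j(i)$ and $S_{ij}$, the expected reward of $i$ is

\[ E[R_i] = \sum_{j=1}^{D} \left( E\big[\hat{X}_{ij}X_{r_j(i)j} + (1 - \hat{X}_{ij})(1 - X_{r_j(i)j})\big] - \sum_{j' \in T_{ij}} \frac{E[\hat{X}_{ij}]}{d}\,(2E[p_{r_{j'}(i)}[H]] - 1) - (1 - E[p_{r_j(i)}[H]]) \right) \]
\[ = \sum_{j=1}^{D} \left( E\big[\hat{X}_{ij}X_{r_j(i)j} + (1 - \hat{X}_{ij})(1 - X_{r_j(i)j})\big] - \frac{d_{ij}}{d}\,E[\hat{X}_{ij}]\,(2\bar{p}_i[H] - 1) - (1 - \bar{p}_i[H]) \right), \]

where $\bar{p}_i[H] = E[p_{r_j(i)}[H]] = E_{j' \in T_{ij}}[p_{r_{j'}(i)}[H]]$.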
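The expected reward to agent $i$, when she makes evaluations with proficiency $q_j$ for task $j$ and truthfully
reports these evaluations ($\hat{X}_{ij} = X_{ij}$), is then

\[ E[R_i] = \sum_{j=1}^{D} \left[ q_j\bar{p}_i + (1 - q_j)(1 - \bar{p}_i) - \frac{d_{ij}}{d}\,q_j[H]\,(2\bar{p}_i[H] - 1) - (1 - \bar{p}_i[H]) \right] \]
\[ = \sum_{j=1}^{D} \left[ q_j\bar{p}_i + (1 - q_j)(1 - \bar{p}_i) - \frac{d_{ij}}{d}\big(q_j[H]\bar{p}_i[H] + (1 - q_j[H])(1 - \bar{p}_i[H])\big) - \left(1 - \frac{d_{ij}}{d}\right)(1 - \bar{p}_i[H]) \right] \]
\[ = \sum_{j=1}^{D} \left[ f_{d_{ij}/d}(q_j, \bar{p}_i) - \left(1 - \frac{d_{ij}}{d}\right)(1 - \bar{p}_i[H]) \right], \tag{3} \]

where the expectation is taken over randomness in all agents' evaluations, as well as over randomness in
the choices of $r_j(i)$ and $S_{ij}$.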

We can now show that choosing full effort and truthtelling on all tasks is a best response when all other
agents use $[(1, X)]$ if $d_{ij} = d$. First, by Lemma 4, $f_{d_{ij}/d}(q_j, \bar{p}_i)$ is increasing in $q_j$ provided $d_{ij}/d \leq 1$, so agent
$i$ should choose full effort to achieve her maximum proficiency $p_i$ on all tasks. Next, note that in terms
of the expected reward, using proficiency $q_j$ and reporting $X^c$ is equivalent to using proficiency $1 - q_j$ and
reporting $X$. So again by Lemma 4, deviating to $X^c$, i.e., $(1 - q_j)$, on any task is strictly dominated by $X$
for $q_j > 1/2$ and $\bar{p}_i > 1/2$.

Finally, if agent $i$ chooses $f_{ij}$ as the function which reports the outcome of a random cointoss with
probability $r$ of $H$ for any task $j$, the component of $E[R_i]$ contributed by the term corresponding to $\hat{X}_{ij}$
becomes

\[ E\big[\hat{X}_{ij}X_{r_j(i)j} + (1 - \hat{X}_{ij})(1 - X_{r_j(i)j})\big] - \frac{d_{ij}}{d}\big(E[\hat{X}_{ij}]\,p_{r_j(i)}[H] + (1 - E[\hat{X}_{ij}])(1 - p_{r_j(i)}[H])\big) \]
\[ = r\,p_{r_j(i)}[H] + (1 - r)(1 - p_{r_j(i)}[H]) - \frac{d_{ij}}{d}\big(r\,p_{r_j(i)}[H] + (1 - r)(1 - p_{r_j(i)}[H])\big) \]
\[ = 0, \]

which is strictly smaller than the reward from $f_{ij} = X$ in (3) if $q_j > 1/2$ and $d_{ij}/d \geq 1$, since $f(q_j, \bar{p}_i)$ is
strictly positive when $q_j, \bar{p}_i > 1/2$ by Lemma 4.

Since we need $d_{ij}/d \leq 1$ to ensure that $(1, X^c)$ is not a profitable deviation, and $d_{ij}/d \geq 1$ to ensure that
$(0, r)$ is not a profitable deviation, requiring $d_{ij} = d$ simultaneously satisfies both conditions. Therefore, if
$d_{ij} = d$, deviating from $(1, X)$ on any task $j$ leads to a strict decrease in reward to agent $i$. Since the total
reward to agent $i$ can be decomposed into the sum of $D$ terms which each depend only on the report $\hat{X}_{ij}$
and therefore the strategy for the single task $j$, any deviation from $[(1, X)]$ for any single task or subset of
tasks strictly decreases $i$'s expected reward.

Therefore, the rewards $R_i = \sum_{j \in J(i)} R_{ij}$ can always be scaled appropriately to ensure that $[(1, X)]$ is an
equilibrium of $\mathcal{M}_d$. □

Other equilibria. While $[(1, X)]$ is an equilibrium, $\mathcal{M}_d$ can have other equilibria as well. For instance, the
strategy $[(0, r)]$, where all agents report the outcome of a random cointoss with bias $r$ on each task, is also
an equilibrium of $\mathcal{M}_d$ for all $r \in [0, 1]$, albeit with reward 0 to each agent. In fact, as we show in the next
theorem, no equilibrium, symmetric or asymmetric, in pure or mixed strategies, can yield higher reward
to agents than $[(1, X)]$, as long as agents 'treat tasks equally' (for example, while an agent may choose to
shirk effort on one task and work on all others, each of her tasks is equally likely to be the one she shirks
on). We will refer to this as tasks being 'apriori equivalent', so that agents cannot distinguish between tasks
prior to putting in effort on them (or equivalently, the assignment of reference raters is such that agents
will not find it beneficial (in terms of expected reward) to use a different strategy for a specific task). Note
that this assumption is particularly reasonable in the context of applications where agents are recruited for
a collection of similar tasks as in crowdsourced abuse/adult content identification, or in peer grading where
each task is an anonymous student's solution to the same problem.



Footnote: We note that another equilibrium which achieves the same maximum expected reward is $[(1, X^c)]$, where all
agents put in full effort to make their evaluations, but then all invert their evaluations for their reports. However,
this is a rather unnatural, and risky, strategy, and one that is unlikely to arise in practice. Also, as we will see
later, $[(1, X^c)]$ can also lead to lower rewards when there are some agents who always report truthfully.
Theorem 7. Suppose $p_i > 1/2$, and tasks are apriori equivalent. Then, the equilibrium where all agents
choose $[(1, X)]$ yields maximum reward to each agent.

Proof. Consider a particular agent $i$ and task $j$, and a single potential reference rater $r_j(i)$ for $(i, j)$. Recall
from the preliminaries that agent $i$'s choice of $f_{ij}$ can be described via a matrix $M = \alpha_1 M_X + \alpha_2 M_{X^c} +
\alpha_3 M_H + \alpha_4 M_L$, and that we denote $i$'s evaluation via a vector $o$, where $o = [1\ 0]^T$ if $i$ observes $H$ and
$o = [0\ 1]^T$ if $i$ observes $L$. Similarly, let us describe $r_j(i)$'s choice of reporting function via the matrix $M'$
with corresponding coefficients $\alpha'_i$, and denote $r_j(i)$'s evaluation by $o'$.

Since tasks are apriori equivalent, each player $i$ (hence $r_j(i)$ too) uses strategies such that $E[\hat{X}_{ij}] = E[\hat{X}_{ik}]$
for all $j, k \in J(i)$. Then, we can rewrite the expected reward for agent $i$ on task $j$, when paired with reference
rater $r_j(i)$, as

\[ E[R_{ij}] = 2\big(E[\hat{X}_{ij}\hat{X}_{r_j(i)j}] - E[\hat{X}_{ij}]E[\hat{X}_{r_j(i)j}]\big). \]

Using the matrix-vector representation, substituting $M, M'$ with their representations in terms of the basis
matrices and expanding, and evaluating the matrix-matrix products, we have

\[ \hat{X}_{ij}\hat{X}_{r_j(i)j} = o'^T M'^T M o = o'^T R_M o, \]

where

\[ R_M = (\alpha_1\alpha'_1 + \alpha_2\alpha'_2) I + \alpha_2\alpha'_1 M_{X^c} + \alpha_1\alpha'_2 M_{X^c}^T + (\alpha_3\alpha'_3 + \alpha_4\alpha'_4)\mathbb{1} + (\alpha_3\alpha'_1 + \alpha_4\alpha'_2) M_H + (\alpha_1\alpha'_3 + \alpha_2\alpha'_4) M_H^T \]
\[ \qquad + (\alpha_4\alpha'_1 + \alpha_3\alpha'_2) M_L + (\alpha_1\alpha'_4 + \alpha_2\alpha'_3) M_L^T, \]

and $I$, $\mathbb{1}$ denote the identity matrix and the matrix of all ones in $\mathbb{R}^{2 \times 2}$ respectively, and we use $M_X^T M_X =
M_{X^c}^T M_{X^c} = I$, $M_H^T M_H = M_L^T M_L = \mathbb{1}$, $M_{X^c}^T M_H = M_L$, $M_{X^c}^T M_L = M_H$, and $M_H^T M_L = 0$. Similarly,

\[ E[\hat{X}_{ij}]E[\hat{X}_{r_j(i)j}] = E[o'^T M'^T]E[M o] = E[o'^T] R_M E[o], \]

where $R_M$ is as defined above.
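The matrix product identities used in this expansion can be confirmed by direct multiplication. The sketch below writes out the four $2 \times 2$ basis matrices explicitly (truthful, inverting, always-$H$, always-$L$), assuming the convention that column 1 corresponds to observing $H$ and column 2 to observing $L$; `matmul` and `transpose` are small hypothetical helpers so that no external library is needed.

```python
from typing import List

Matrix = List[List[int]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two 2x2 integer matrices."""
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def transpose(A: Matrix) -> Matrix:
    """Transpose of a 2x2 matrix."""
    return [list(col) for col in zip(*A)]


M_X = [[1, 0], [0, 1]]    # truthful reporting (identity)
M_XC = [[0, 1], [1, 0]]   # inverted reporting
M_H = [[1, 1], [0, 0]]    # always report H
M_L = [[0, 0], [1, 1]]    # always report L
ONES = [[1, 1], [1, 1]]
ZERO = [[0, 0], [0, 0]]

identities = {
    "X^T X = I": matmul(transpose(M_X), M_X) == M_X,
    "Xc^T Xc = I": matmul(transpose(M_XC), M_XC) == M_X,
    "H^T H = ones": matmul(transpose(M_H), M_H) == ONES,
    "L^T L = ones": matmul(transpose(M_L), M_L) == ONES,
    "Xc^T H = L": matmul(transpose(M_XC), M_H) == M_L,
    "Xc^T L = H": matmul(transpose(M_XC), M_L) == M_H,
    "H^T L = 0": matmul(transpose(M_H), M_L) == ZERO,
}
```

Every entry of `identities` is true, which is all the expansion of $R_M$ relies on.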

Now, note that $M_H o = [o_1 + o_2\ \ 0]^T = [1\ 0]^T$ since $o_1 + o_2 = 1$ for any evaluation vector $o$ by definition,
so that $E[o'^T M_H o] = E[o'^T]E[M_H o]$, since $M_H o$ is a constant. The same is the case for each of the terms
$E[o'^T \mathbb{1} o]$, $E[o'^T M_H^T o]$, $E[o'^T M_L o]$, $E[o'^T M_L^T o]$. Therefore, these terms cancel out when taking the difference
$E[\hat{X}_{ij}\hat{X}_{r_j(i)j}] - E[\hat{X}_{ij}]E[\hat{X}_{r_j(i)j}]$ (corresponding to the reward from either agent choosing to report $\hat{X}_{ij}$
independent of her evaluation being 0). Also note that $E[o] = [p[H]\ \ p[L]]^T$ if agent $i$ makes evaluations
with proficiency $p$. Suppose the agents use effort leading to proficiencies $p$ and $p'$ respectively. Then, we
have

\[ E[\hat{X}_{ij}\hat{X}_{r_j(i)j}] - E[\hat{X}_{ij}]E[\hat{X}_{r_j(i)j}] = (\alpha_1\alpha'_1 + \alpha_2\alpha'_2)\big(E[o'^T o] - E[o']^T E[o]\big) + \alpha_2\alpha'_1\big(E[o'^T M_{X^c} o] - E[o'^T] M_{X^c} E[o]\big) \]
\[ \qquad + \alpha_1\alpha'_2\big(E[o'^T M_{X^c}^T o] - E[o'^T] M_{X^c}^T E[o]\big) \]
\[ = (\alpha_1\alpha'_1 + \alpha_2\alpha'_2)\big(E[o'_1 o_1 + o_2 o'_2] - E[o'_1]E[o_1] - E[o_2]E[o'_2]\big) \]
\[ \qquad + (\alpha_2\alpha'_1 + \alpha_1\alpha'_2)\big(E[o'_1 o_2 + o_1 o'_2] - E[o'_1]E[o_2] - E[o_1]E[o'_2]\big) \]
\[ = (\alpha_1\alpha'_1 + \alpha_2\alpha'_2)\big(pp' + (1 - p)(1 - p') - p[H]p'[H] - (1 - p[H])(1 - p'[H])\big) \]
\[ \qquad + (\alpha_2\alpha'_1 + \alpha_1\alpha'_2)\big(p(1 - p') + (1 - p)p' - p[H](1 - p'[H]) - (1 - p[H])p'[H]\big). \]

Now, note that the multiplier of $(\alpha_1\alpha'_1 + \alpha_2\alpha'_2)$ is precisely $f(p, p')$, which by Lemma 4 is nonnegative if $p, p' \geq 1/2$,
and strictly positive if $p, p' > 1/2$. Also, for $p, p' > 1/2$, note that $p[H] < p$ and $p'[H] < p'$. Now, the function
$g(x, y) = x(1 - y) + y(1 - x)$ is decreasing in both $x$ and $y$ for $x, y \in [1/2, 1]$ (taking derivatives), so the multiplier
of $(\alpha_2\alpha'_1 + \alpha_1\alpha'_2)$ is non-positive, and negative if $p, p' > 1/2$.

So the maximum value that $E[\hat{X}_{ij}\hat{X}_{r_j(i)j}] - E[\hat{X}_{ij}]E[\hat{X}_{r_j(i)j}]$ can take for nonnegative coefficients with
$\sum_i \alpha_i = \sum_i \alpha'_i = 1$ is $f(p, p')$, which is obtained by setting $\alpha_3 = \alpha_4 = 0$, $\alpha'_3 = \alpha'_4 = 0$ (i.e., with no weight on
random independent reporting), and $\alpha_1 = \alpha'_1 = 1$, $\alpha_2 = \alpha'_2 = 0$ (or vice versa): this is because the maximum
value of the term $(\alpha_1\alpha'_1 + \alpha_2\alpha'_2)$ when $\alpha_2 = 1 - \alpha_1$ and $\alpha'_2 = 1 - \alpha'_1$ is 1 and is achieved with these values, which
also minimize the value of the term $(\alpha_2\alpha'_1 + \alpha_1\alpha'_2)$ with the non-positive multiplier, since $(\alpha_2\alpha'_1 + \alpha_1\alpha'_2) \geq 0$
and is equal to 0 for these values of $\alpha_1, \alpha'_1$. Also, since $f(p, p')$ increases with increasing $p$ and $p'$, it is
maximized when agents put in full effort and achieve their maximum proficiencies $p_i, p_{r_j(i)}$.

Therefore the expected reward for the single component of $E[R_{ij}]$ coming from a specific reference rater
achieves its upper bound when both agents use $[(1, X)]$. The same argument applies for each reference rater,
and therefore to the expected reward $E[R_{ij}]$, and establishes the claim. □

We next investigate what kinds of Nash equilibria might exist where agents use low effort with any positive
probability. Apriori, it is reasonable to expect that there would be mixed-strategy equilibria where agents
randomize between working and shirking, i.e., put in effort (choose $e_{ij} = 1$) sometimes and not (choose
$e_{ij} = 0$) some other times. However, we next show that as long as tasks are apriori equivalent and agents
only randomize between reporting truthfully and reporting the outcome of an independent random cointoss
(i.e., they do not invert evaluations), the only equilibrium in which any agent uses any support on $(0, r)$ is
the one in which all agents always use $(0, r)$ on all their tasks. To show this, we start with the following
useful lemma saying that an agent who uses a low-effort strategy any fraction of the time will always have
a beneficial deviation as long as some reference agent plays $(1, X)$ with some positive probability. Roughly
speaking, this is because as long as there is some probability that an agent's reference rater plays $(1, X)$
rather than $(0, r)$, the agent strictly benefits by always playing $(1, X)$ to maximize the probability of both
agents playing $(1, X)$, which is the only time the agent obtains a positive reward.

Lemma 8. Suppose the probability of agent $i$ using strategy $(1, X)$ is $\delta$ and strategy $(0, r_i)$ is $1 - \delta$ for each
task $j \in J(i)$. Suppose $i$'s potential reference raters $r_j(i)$ use strategies $(1, X)$ and $(0, r_{r_j(i)})$ with probabilities
$\epsilon_{r_j(i)}$ and $1 - \epsilon_{r_j(i)}$ respectively, for each task $j \in J(i)$. If $\epsilon_{r_j(i)} > 0$ for any reference rater with proficiency
$p_{r_j(i)} > 1/2$, then agent $i$ has a (strict) profitable deviation to $\delta' = 1$, i.e., to always using strategy $(1, X)$,
for all values of $r_i \in [0, 1]$.

Proof. Consider a particular task $j$, and let $k = 1, \ldots, K$ be the potential reference raters for $(i, j)$. Let $a_k$
denote the probability that $k$ is the reference rater for agent $i$ for task $j$. By linearity of expectation, $i$'s
expected reward for $j$ can be written as

\[ E[R_{ij}] = \sum_{k=1}^{K} a_k \Big[ \delta\epsilon_k\big(p_i p_k + (1 - p_i)(1 - p_k) - (p_i[H]p_k[H] + (1 - p_i[H])(1 - p_k[H]))\big) \]
\[ \qquad + (1 - \delta)\epsilon_k\big(r_i p_k[H] + (1 - r_i)(1 - p_k[H]) - (r_i p_k[H] + (1 - r_i)(1 - p_k[H]))\big) \]
\[ \qquad + \delta(1 - \epsilon_k)\big(p_i[H]r_k + (1 - p_i[H])(1 - r_k) - (p_i[H]r_k + (1 - p_i[H])(1 - r_k))\big) \]
\[ \qquad + (1 - \delta)(1 - \epsilon_k)\big(r_i r_k + (1 - r_i)(1 - r_k) - (r_i r_k + (1 - r_i)(1 - r_k))\big) \Big] \]
\[ = \delta \sum_k a_k\epsilon_k\big(p_i p_k + (1 - p_i)(1 - p_k) - (p_i[H]p_k[H] + (1 - p_i[H])(1 - p_k[H]))\big) \]
\[ = \delta \sum_k a_k\epsilon_k f_1(p_i, p_k). \]

Now, $E[R_{ij}]$ is linear in $\delta$, and by Lemma 4 the coefficient of $\delta$ is nonnegative for all $p_i$ and $p_k \geq 1/2$, and
strictly greater than 0 if $\epsilon_k > 0$ for some $k$ with $p_k > 1/2$. Therefore, $i$ can strictly increase her expected
reward $E[R_{ij}]$ by increasing $\delta$ for any $\delta < 1$, as long as there is some reference agent $k$ with $\epsilon_k > 0$ and
$p_k > 1/2$.

The same argument holds for each task $j \in J(i)$, and therefore to strictly improve $i$'s total reward $E[R_i]$,
we only need one reference rater across all tasks to satisfy $\epsilon_k > 0$ and $p_k > 1/2$ to obtain a strictly beneficial
deviation (recall that we assumed $p_i > 1/2$ for all $i$). □

This lemma immediately allows us to show that the only low-effort equilibria of $\mathcal{M}_d$ that we reasonably
need to be concerned about is the pure-strategy equilibrium in which $\delta_{ij} = 0$ for all $i, j$. Note that different
agents could use different $r_i$ (or even $r_{ij}$) in such equilibria, but all agents will receive reward 0 in all such
equilibria.

Footnote: We say reasonably because of the technical possibility of equilibria where some agents mix over $(1, X^c)$ as well.
Theorem 9. Suppose every agent can be a reference rater with some non-zero probability for every other
agent, and tasks are apriori equivalent. Then, the only equilibria (symmetric or asymmetric) in which agents
mix between $[(1, X)]$ and any low-effort strategy $[(0, r_{ij})]$ with non-trivial support on $[(0, r_{ij})]$ are those where
all agents always use low effort on all tasks.

Eliminating low-effort equilibria. Our final result uses Lemma 8 to obtain a result about eliminating
low-effort equilibria. Suppose there are some trusted agents (for example, an instructor or TA in the
peer-grading context or workers with long histories of accurate evaluations or good performance in crowdsourcing
platforms) who always report truthfully with proficiency $t > 1/2$. Let $\epsilon_t$ denote the minimum probability,
over all agents $i$, that the reference rater for agent $i$ is such a trusted agent (note that we can ensure $\epsilon_t > 0$
by having the trusted agent randomly choose each task with positive probability). Lemma 8 immediately
gives us the following result for $\mathcal{M}_d$, arising from the fact that the reward from playing a random strategy
$(0, r)$ is exactly 0: the presence of trusted agents with a non-zero probability, however small, is enough to
eliminate low-effort equilibria altogether.

Theorem 10. Suppose $\epsilon_t > 0$. Then $[(0, r_{ij})]$ is not an equilibrium of $\mathcal{M}_d$ for any $r_{ij} \in [0, 1]$.

Proof. Suppose all agents except the trusted agent use the strategy $(0, r_{ij})$, and $\epsilon_t$ is the probability that
the trusted agent is the reference rater for any agent-task pair. Then, since agent $i$ reports $X_{ij}$ according to
a random coin toss independent of the actual realization of $j$, the payoff from any reference rater, whether
the trusted agent or another agent playing $(0, r)$, is 0. For notational simplicity, let $r = r_{ij}$ and $r' = r_{r_j(i)j}$. Then

$$
\begin{aligned}
E[R_{ij}] &= \epsilon_t \big( r\,t[H] + (1 - r)(1 - t[H]) - ( r\,t[H] + (1 - r)(1 - t[H]) ) \big) \\
&\quad + (1 - \epsilon_t) \big( r r' + (1 - r)(1 - r') - ( r r' + (1 - r)(1 - r') ) \big) \\
&= 0.
\end{aligned}
$$

By deviating to $(1, X)$, agent $i$ can strictly improve her payoff as long as $\epsilon_t > 0$ and $t, p > 1/2$, since her
expected reward from this deviation is

$$
\begin{aligned}
E[R_{ij}] &= \epsilon_t \big( p t + (1 - p)(1 - t) - ( p[H]\,t[H] + (1 - p[H])(1 - t[H]) ) \big) \\
&\quad + (1 - \epsilon_t) \big( r r' + (1 - r)(1 - r') - ( r r' + (1 - r)(1 - r') ) \big) \\
&> 0,
\end{aligned}
$$

since the coefficient of $\epsilon_t$ is positive for $t, p > 1/2$ by Lemma [U]. Therefore, there is a strictly beneficial
deviation to $(1, X)$, so there is a choice of multiplier for the reward such that the payoff to agent $i$, which
is the difference between the reward and the cost of effort $c$, is strictly positive as well. So $(0, r_{ij})$ is not an
equilibrium of $\mathcal{M}_d$ when $\epsilon_t > 0$.

□ 
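The two expected rewards in the proof can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes that a truthful agent with proficiency $p$ reports $H$ with probability $p[H] = p\,P[H] + (1 - p)(1 - P[H])$, a form that this section does not restate. The terms for a non-trusted reference rater cancel in both expressions, so the deviation reward keeps only the $\epsilon_t$ term.

```python
def report_h_probability(proficiency: float, prior_h: float) -> float:
    """Probability that a truthful agent reports H (assumed form, see lead-in)."""
    return proficiency * prior_h + (1.0 - proficiency) * (1.0 - prior_h)


def same_report_prob(x: float, y: float) -> float:
    """x*y + (1-x)*(1-y): the expression used for both the agreement and statistic terms."""
    return x * y + (1.0 - x) * (1.0 - y)


def reward_random(eps_t: float, r: float, r_prime: float, t: float, prior_h: float) -> float:
    """Expected reward of agent i under the random strategy (0, r)."""
    t_h = report_h_probability(t, prior_h)
    # Reports are independent of the task, so agreement equals the statistic term.
    trusted = same_report_prob(r, t_h) - same_report_prob(r, t_h)
    other = same_report_prob(r, r_prime) - same_report_prob(r, r_prime)
    return eps_t * trusted + (1.0 - eps_t) * other


def reward_deviation(eps_t: float, p: float, t: float, prior_h: float) -> float:
    """Expected reward after deviating to (1, X); the non-trusted term is zero."""
    p_h = report_h_probability(p, prior_h)
    t_h = report_h_probability(t, prior_h)
    agreement = same_report_prob(p, t)      # both correct or both wrong
    statistic = same_report_prob(p_h, t_h)  # agreement expected by chance
    return eps_t * (agreement - statistic)


if __name__ == "__main__":
    print(reward_random(0.01, 0.3, 0.7, 0.9, 0.5))
    print(reward_deviation(0.01, 0.8, 0.9, 0.5))
```

With $\epsilon_t = 0.01$, $p = 0.8$, $t = 0.9$ and $P[H] = 0.5$, the deviation reward is roughly 0.0024, which is small but strictly positive, while the random strategy earns exactly 0.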

This result, while simple, is fairly strong: as long as some positive fraction of the population can be
trusted to always report truthfully with proficiency greater than 1/2, the only reasonable* equilibrium of
$\mathcal{M}$ is the high-effort equilibrium $[(1, X)]$, no matter how small this fraction. In particular, note that $\mathcal{M}$ does
not need to assign a higher reward for agreement with a trusted agent to achieve this result, and therefore
does not need to know the identity of the trusted agents. In contrast, the mechanism which rewards agents
for agreement with a reference rater without subtracting out our statistic term must use a higher reward
$w(\epsilon_t)$ for agreement with the trusted agents, which increases as $\epsilon_t$ decreases, to eliminate low-effort equilibria†; this,
in addition to being undesirably large, also requires identification of trusted agents.

*Again, we say reasonable rather than unique because $(1, X^c)$ does remain an equilibrium of $\mathcal{M}$ for all $\epsilon_t$ less than a
threshold value; however, in addition to being an unnatural and risky strategy, this equilibrium yields strictly smaller payoffs
than $[(1, X)]$ when $\epsilon_t > 0$. Note also that the introduction of such trusted agents does not introduce new equilibria, and that
$[(1, X)]$ remains an equilibrium of $\mathcal{M}$.

†The same is the case for a mechanism based on rewarding for the 'right' variance, which does retain $[(1, X)]$ as a maximum
reward equilibrium, but still requires identifying the trusted agents and rewarding extra for agreement with them.

5 Creating the Task Assignment

While in some crowdsourcing settings agents choose tasks at will, there are also applications where a principal
can potentially choose an assignment of a collection of her tasks among some assembled pool of workers. In
this section, we present a simple algorithm to design an assignment of tasks to agents such that we can satisfy the
condition in Theorem 6 for mechanism $\mathcal{M}_{D-1}$, i.e., when $d = D - 1$. We note that with this assignment
of tasks to agents, choosing reference raters appropriately is trivially feasible for $d = 1$, i.e., for $\mathcal{M}_1$, and
ensuring $d_{ij} = d$ is also easy, as described earlier.

We start out by randomly permuting all agents using a permutation $\pi$. For simplicity of presentation we
assume that $\frac{m}{D}$ ($= \frac{n}{T}$) is an integer. The $m$ tasks are divided into $\frac{m}{D}$ task-blocks, each containing $D$ tasks.
Similarly, the $n$ agents are divided into $T$ agent blocks, each containing $\frac{m}{D}$ agents. We number the task-blocks
by $b = 1, \ldots, \frac{m}{D}$ and the agent blocks by $a = 1, \ldots, T$. The agents in block $a$ are thus $(a - 1)\frac{m}{D} + 1, \ldots, a\frac{m}{D}$,
and the tasks in block $b$ are $(b - 1)D + 1, \ldots, bD$.

We first describe the algorithm and then show that it produces an assignment that satisfies the conditions
required in the definition of $\mathcal{M}_{D-1}$, in particular that for each agent-task pair, it is possible to choose a
reference rater who has only that task in common with this agent. The algorithm works as follows: we
assign tasks for agents starting from the agent block $a = 1$ onwards. For block 1, each agent $i'$ in the block
is assigned all the tasks corresponding to the task-block $i'$ (recall that the number of agents in a block equals
$\frac{m}{D}$, the number of task-blocks). This completely fills up the capacity of the agents in block 1. For blocks
$a = 2, \ldots, T$, consecutively, the agent $(a - 1)\frac{m}{D} + i'$ is assigned the $D$ tasks $\{i', i' + \frac{m}{D}, \ldots, i' + \frac{m}{D}(D - 1)\}$, for
$i' = 1, \ldots, \frac{m}{D}$.

The above assignment completely describes the set $J(i)$ of tasks for every agent $i$ and the set of agents for every
task $j$. For each task $j$, let $i^*$ denote the unique agent in block 1 who works on task $j$. We define the reference raters as
follows: for each agent-task pair $(i, j)$, if $i$ lies in blocks $\{2, \ldots, T\}$, define the reference rater $r_j(i) = i^*$. If
$i$ lies in block 1, define the reference rater to be any other user who is working on this task. Note that for
$d = D - 1$, the sets $S_{ij}$ and $S_{r_j(i)j}$ are exactly $S_{ij} = J(i) \setminus \{j\}$ and $S_{r_j(i)j} = J(r_j(i)) \setminus \{j\}$.
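The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the name `build_assignment` is hypothetical, the initial random permutation of agents is omitted (it only relabels agents), and for agents in block 1 the sketch picks the smallest-numbered other agent on the task, where the text allows any other agent. It assumes $T \geq 2$ so that such an agent exists.

```python
from typing import Dict, Set, Tuple


def build_assignment(m: int, D: int, T: int) -> Tuple[Dict[int, Set[int]], Dict[Tuple[int, int], int]]:
    """Return (J, ref): J[i] is the task set of agent i, ref[(i, j)] is r_j(i).

    Agents are numbered 1..T*(m/D) and tasks 1..m, as in the text.
    """
    if m % D != 0:
        raise ValueError("m / D must be an integer")
    if T < 2:
        raise ValueError("this sketch assumes at least two agent blocks")
    B = m // D  # number of task-blocks, also the number of agents per agent block

    J: Dict[int, Set[int]] = {}
    # Block 1: agent i' takes every task of task-block i'.
    for ip in range(1, B + 1):
        J[ip] = set(range((ip - 1) * D + 1, ip * D + 1))
    # Blocks 2..T: agent (a-1)B + i' takes tasks i', i' + B, ..., i' + B(D-1).
    for a in range(2, T + 1):
        for ip in range(1, B + 1):
            J[(a - 1) * B + ip] = {ip + B * k for k in range(D)}

    ref: Dict[Tuple[int, int], int] = {}
    for i, tasks in J.items():
        for j in tasks:
            if i > B:
                # i is in blocks 2..T: use i*, the block-1 agent holding j's task-block.
                ref[(i, j)] = (j - 1) // D + 1
            else:
                # i is in block 1: any other agent working on j (tie-break is an assumption).
                ref[(i, j)] = min(k for k in J if k != i and j in J[k])
    return J, ref


if __name__ == "__main__":
    J, ref = build_assignment(m=12, D=3, T=2)
    for i in sorted(J):
        print(i, sorted(J[i]))
```

For $m = 12$, $D = 3$, $T = 2$ there are four task-blocks and eight agents: agent 1 receives tasks {1, 2, 3}, while agent 5 (the first agent of block 2) receives {1, 5, 9} and shares exactly one task with each of agents 1, 2 and 3.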

The following lemma proves two things: first, the assignment above is actually feasible under fairly mild
conditions, and second, the choice of reference raters satisfies the conditions in the definition of $\mathcal{M}_d$ and
those required by Theorem 6.

Lemma 11. If $m > D^2$, the above algorithm generates a feasible assignment, i.e., every agent is assigned
exactly $D$ tasks and every task is assigned to $T$ agents. Also, for every agent-task pair $(i, j)$, the reference rater $r_j(i)$ satisfies
$J(r_j(i)) \cap J(i) = \{j\}$. Furthermore, the expectation condition required by Theorem 6 holds.

Proof. Agents in block 1 are clearly assigned to their full capacity. For blocks $a = 2, \ldots, T$, for every agent
$i = (a - 1)\frac{m}{D} + i'$, the set $J(i) = \{i', i' + \frac{m}{D}, \ldots, i' + \frac{m}{D}(D - 1)\}$. Note that for each $i$, the above values are all
distinct, and that $i' + \frac{m}{D}(D - 1) \leq \frac{m}{D} + \frac{m}{D}(D - 1) = m$. Thus every agent's assignment is feasible. Since the
total capacity of agents equals the total capacity of the tasks, the tasks are also assigned completely, and to
distinct agents.

In order to see that the choice of reference raters is feasible, note that if $D^2 < m$, then $\frac{m}{D} > D$, and
hence the tasks for each agent in blocks $2, \ldots, T$ belong to distinct task-blocks. For agent-task pairs $(i, j)$ where the agents are in
blocks $2, \ldots, T$, the reference rater is $r_j(i) = i^*$, the unique agent in block 1, who works only on the task-block
that $j$ belongs to. By the above argument, $i$ does not work on any other task from this block, and hence
$J(i^*) \cap J(i) = \{j\}$. By the same argument, $i$ is also a feasible reference rater for $i^*$ on task $j$. Thus, the
choice of reference raters satisfies the condition for $\mathcal{M}_{D-1}$.

Finally, the expectation condition follows simply from the random permutation applied to the set of 
agents at the beginning of the construction. 

□ 

6 Discussion 

In this paper, we introduced the problem of information elicitation when agents' proficiencies are endogenously
determined as a function of their effort, and presented a simple mechanism which uses the presence
of multiple tasks to identify and penalize low-effort agreement to incentivize effort when tasks have binary
types. Our mechanism has the property that maximum effort followed by truthful reporting is the Nash
equilibrium with maximum payoff to all agents among all equilibria, including mixed strategy equilibria. In addition to handling
endogenous agent proficiencies, to the best of our knowledge this is the first mechanism for information
elicitation with this 'best Nash equilibrium' property over all pure and mixed strategy equilibria that
requires agents to only report their own evaluations (i.e., without requiring 'prediction' reports of their
beliefs about other agents' reports), and does not impose any requirement on a diverging number of agent
reports per task to achieve its incentive properties. Our mechanism provides a starting point for designing
information elicitation mechanisms for several crowdsourcing settings where proficiency is an endogenous,
effort-dependent choice, such as image labeling, tagging, and peer grading in online education.

We use the simplest possible model that captures the complexities arising from strategically determined
agent proficiencies, leading to a number of immediate directions for further work. First, our underlying outcome
space is binary ($H$ or $L$); modeling and extending the mechanism to allow a richer space of outcomes
and feedback is one of the most immediate and challenging directions for further work. Also, our model of
effort is binary, where agents either exert full effort and achieve maximum proficiency, or exert no effort and
achieve the baseline proficiency. While our results extend to a model where proficiency increases linearly
with cost, a natural question is how they extend to more general models, for example, with convex costs.
Finally, a very interesting direction is that of heterogeneous tasks with task-specific priors and abilities. In
our model, tasks are homogeneous with the same prior $P[H]$, and agents have the same cost and maximum
proficiency for each task. If tasks differ in difficulty, and agents can observe the difficulty of a task prior
to putting in effort, there are clear incentives to shirk on harder tasks while putting in effort for the easier
ones. While tasks are indeed a priori homogeneous (or can be partitioned to be so) in some crowdsourcing
settings, there are other applications where some tasks are clearly harder than others; also, agents may
have task-specific abilities. Designing mechanisms with strong incentive properties for this setting is a very
promising and important direction for further work.